Practical 5

Models for areal data

In this practical we are going to fit an areal model. We will:

  • Learn how to fit an areal model in inlabru

  • Learn how to add spatial covariates to the model

  • Learn how to do predictions

  • Learn how to simulate from the fitted model


Libraries to load:

library(dplyr)
library(INLA)
library(ggplot2)
library(patchwork)
library(inlabru)   
library(mapview)

# load some libraries to generate nice map plots
library(scico)

The data

We consider data on respiratory hospitalizations for Greater Glasgow and Clyde in 2007. The data are available from the CARBayesdata R Package:

library(CARBayesdata)

data(pollutionhealthdata)
data(GGHB.IZ)

The pollutionhealthdata contains the spatiotemporal data on respiratory hospitalizations, air pollution concentrations and socio-economic deprivation covariates for the 271 Intermediate Zones (IZ) that make up the Greater Glasgow and Clyde health board in Scotland. Data are provided by the Scottish Government and the available variables are:

  • IZ: unique identifier for each IZ.
  • year: the year when the measurements were taken.
  • observed: observed numbers of hospitalizations due to respiratory disease.
  • expected: expected numbers of hospitalizations due to respiratory disease, computed using indirect standardisation from Scotland-wide respiratory hospitalization rates.
  • pm10: average particulate matter (less than 10 microns) concentrations.
  • jsa: the percentage of working-age people who are in receipt of Job Seekers Allowance.
  • price: average property price (divided by 100,000).

The GGHB.IZ data is a Simple Features (sf) object containing the spatial polygon information for the set of 271 Intermediate Zones (IZ) that make up the Greater Glasgow and Clyde health board in Scotland ( Figure 1 ).
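Before merging, it can help to take a quick look at both objects (a sketch):

```r
# first rows of the health/pollution data
head(pollutionhealthdata)

# GGHB.IZ is an sf object: one polygon per Intermediate Zone
print(GGHB.IZ)
```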

Figure 1: Greater Glasgow and Clyde health board represented by 271 Intermediate Zones
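A map like Figure 1 can be reproduced with a simple geom_sf() call (a sketch; fill and border colours are arbitrary choices):

```r
# plot the 271 Intermediate Zone polygons
ggplot() +
  geom_sf(data = GGHB.IZ, fill = "grey90", colour = "grey40") +
  ggtitle("Greater Glasgow and Clyde health board: 271 Intermediate Zones")
```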

We first merge the two datasets, select a single year of data, compute the standardised morbidity ratio (SMR = observed/expected), and plot it:

resp_cases <- merge(GGHB.IZ %>% mutate(space = 1:nrow(GGHB.IZ)),
                    pollutionhealthdata, by = "IZ") %>%
  filter(year == 2007) %>%
  mutate(SMR = observed / expected)

ggplot() + geom_sf(data = resp_cases, aes(fill = SMR)) + scale_fill_scico(direction = -1)

Then we compute the adjacency matrix using the functions poly2nb() and nb2mat() from the spdep library, and convert it into the precision matrix \(\mathbf{Q}\) of the CAR model. Remember that this matrix has, on the diagonal, the number of neighbours of each area, and \(-1\) in entry \((i,j)\) whenever areas \(i\) and \(j\) are neighbours.

library(spdep)

W.nb <- poly2nb(GGHB.IZ, queen = TRUE)
R <- nb2mat(W.nb, style = "B", zero.policy = TRUE)

# Q: number of neighbours on the diagonal, -1 for each pair of neighbours
n_neigh <- rowSums(R)
Q <- -R
diag(Q) <- n_neigh
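A quick sanity check: each row of \(\mathbf{Q}\) should sum to zero, since the diagonal holds the neighbour count and each neighbour contributes \(-1\):

```r
# rows of a CAR precision matrix built this way sum to zero
all(rowSums(Q) == 0)
```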

The model

We fit a first model to the data where we consider a Poisson model for the observed cases.

Stage 1 Model for the response \[ y_i|\eta_i\sim\text{Poisson}(E_i\lambda_i) \] where \(E_i\) are the expected cases for area \(i\).

Stage 2 Latent field model \[ \eta_i = \text{log}(\lambda_i) = \beta_0 + \omega_i + z_i \] where

  • \(\beta_0\) is a common intercept
  • \(\mathbf{\omega} = (\omega_1, \dots, \omega_k)\) is a conditional autoregressive (CAR) model with precision matrix \(\tau_1\mathbf{Q}\)
  • \(\mathbf{z} = (z_1, \dots, z_k)\) is an unstructured random effect with precision \(\tau_2\)

Stage 3 Hyperparameters

The hyperparameters of the model are \(\tau_1\) and \(\tau_2\)

NOTE In this case the linear predictor \(\eta\) consists of three components!!
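By default INLA places log-gamma priors on the log-precisions \(\log\tau_1\) and \(\log\tau_2\). These can be replaced, for example with PC priors, through the `hyper` argument of each component. A sketch (the parameter values below are illustrative, not recommendations):

```r
# PC prior on each precision: P(sigma > 1) = 0.01 (illustrative values)
pcprior = list(prec = list(prior = "pc.prec", param = c(1, 0.01)))

cmp_pc = ~ Intercept(1) +
  space(space, model = "besag", graph = Q, hyper = pcprior) +
  iid(space, model = "iid", hyper = pcprior)
```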

Tip Question

Fit the above model using inlabru by completing the following code:

cmp = ~ Intercept(1) + space(...) + iid(...)

formula = ...


lik = bru_obs(formula = formula, 
              family = ...,
              E = ...,
              data = ...)

fit = bru(cmp, lik)
Solution:

cmp = ~ Intercept(1) + space(space, model = "besag", graph = Q) + iid(space, model = "iid")

formula = observed ~ Intercept + space + iid

lik = bru_obs(formula = formula, 
              family = "poisson",
              E = expected,
              data = resp_cases)

fit = bru(cmp, lik)

After fitting the model we want to extract results.

Tip Question
  1. What is the estimated value for \(\beta_0\)?

  2. Look at the estimated values of the hyperparameters using

fit$summary.hyperpar

which of the two spatial components (structured or unstructured) explains more of the variability in the counts?
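A sketch of how one might answer both questions. The hyperparameter names ("Precision for space", "Precision for iid") are assumed to follow the component labels above; check names(fit$marginals.hyperpar) if they differ:

```r
# posterior summary of the fixed effects, including the intercept beta_0
fit$summary.fixed

# compare the two random effects on the variance scale
var_space <- inla.emarginal(function(x) 1 / x,
                            fit$marginals.hyperpar[["Precision for space"]])
var_iid <- inla.emarginal(function(x) 1 / x,
                          fit$marginals.hyperpar[["Precision for iid"]])
c(structured = var_space, unstructured = var_iid)
```

The component with the larger posterior variance (equivalently, the smaller precision) explains more of the variability in the counts.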

We now look at the predictions over space.

Tip Question

Complete the code below to produce predictions of the linear predictor \(\eta_i\), of the risk \(\lambda_i = \exp(\eta_i)\), and of the expected counts \(E_i\lambda_i = E_i\exp(\eta_i)\) over the whole area of interest. Then plot the mean and sd of the resulting surfaces.

pred = predict(fit, resp_cases, ~data.frame(log_risk = ...,
                                             risk = exp(...),
                                             cases = ...
                                             ),
               n.samples = 1000)
Solution:

# produce predictions
pred = predict(fit, resp_cases, ~data.frame(log_risk = Intercept + space,
                                             risk = exp(Intercept + space),
                                             cases = expected * exp(Intercept + space)
                                             ),
               n.samples = 1000)
# plot the predictions

p1 = ggplot() + geom_sf(data = pred$log_risk, aes(fill = mean)) + scale_fill_scico(direction = -1) + ggtitle("mean log risk")
p2 = ggplot() + geom_sf(data = pred$log_risk, aes(fill = sd)) + scale_fill_scico(direction = -1) + ggtitle("sd log risk")
p1 + p2

p1 = ggplot() + geom_sf(data = pred$risk, aes(fill = mean)) + scale_fill_scico(direction = -1) + ggtitle("mean risk")
p2 = ggplot() + geom_sf(data = pred$risk, aes(fill = sd)) + scale_fill_scico(direction = -1) + ggtitle("sd risk")
p1 + p2

p1 = ggplot() + geom_sf(data = pred$cases, aes(fill = mean)) + scale_fill_scico(direction = -1) + ggtitle("mean expected counts")
p2 = ggplot() + geom_sf(data = pred$cases, aes(fill = sd)) + scale_fill_scico(direction = -1) + ggtitle("sd expected counts")
p1 + p2
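The mapview package loaded at the top can also be used to explore these maps interactively (a sketch, assuming the pred object from the previous step):

```r
# interactive choropleth of the posterior mean risk
mapview(pred$risk, zcol = "mean", layer.name = "mean risk")
```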

Getting prediction densities

Posterior predictive distributions, that is \(\pi(y_i^{\text{new}}|\mathbf{y})\), are of interest in many applied problems. The bru() function does not return predictive densities. In the previous step we computed predictions for the expected counts, \(\pi(E_i\lambda_i|\mathbf{y})\). The predictive distribution is then: \[ \pi(y_i^{\text{new}}|\mathbf{y}) = \int \pi(y_i^{\text{new}}|E_i\lambda_i)\,\pi(E_i\lambda_i|\mathbf{y})\; d(E_i\lambda_i) \] where, in our case, \(\pi(y_i^{\text{new}}|E_i\lambda_i)\) is Poisson with mean \(E_i\lambda_i\). We can approximate this by simulation using the following algorithm:

  1. Simulate \(n\) replicates \(g_i^k\) of \(E_i\lambda_i\), for \(k = 1,\dots,n\), using the function generate(), which takes the same input as predict().
  2. For each of the \(n\) replicates, simulate a new value \(y_i^{\text{new}}\) using the function rpois().
  3. Summarise the \(n\) samples of \(y_i^{\text{new}}\) using, for example, the mean and the 0.025 and 0.975 quantiles.

Here is the code:

# simulate 1000 realizations of E_i\lambda_i
expected_counts = generate(fit, resp_cases, 
                           ~ expected * exp(Intercept + space),
                           n.samples = 1000)


# simulate Poisson data (one draw per area and per posterior sample)
n_area = nrow(resp_cases)
aa = rpois(n_area * 1000, lambda = as.vector(expected_counts))
sim_counts = matrix(aa, n_area, 1000)

# summarise the samples with posterior means and quantiles
pred_counts = data.frame(observed = resp_cases$observed,
                         m = apply(sim_counts,1,mean),
                         q1 = apply(sim_counts,1,quantile, 0.025),
                         q2 = apply(sim_counts,1,quantile, 0.975),
                         vv = apply(sim_counts,1,var)
                         )

Let’s now plot the observed data against the predicted posterior means, together with the 95% prediction intervals:

pred_counts %>% ggplot() +
  geom_point(aes(observed, m)) +
  geom_errorbar(aes(observed, ymin = q1, ymax = q2)) +
  geom_abline(intercept = 0, slope = 1)
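As a quick calibration check, one can compute the empirical coverage of the 95% prediction intervals using the pred_counts object above; for a well-calibrated model this should be close to 0.95:

```r
# proportion of observed counts inside their 95% prediction interval
mean(pred_counts$observed >= pred_counts$q1 &
     pred_counts$observed <= pred_counts$q2)
```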